ABSTRACT

A comprehensive six-year longitudinal study of the 
development of reading skills during the primary grades for a large 
sample of bilingual (Spanish-English) children and smaller samples of
monolingual (English or Spanish) children is outlined at its
midpoint. In this natural variation study, approximately 350 children
taught by 200 teachers in 20 schools in six districts are tracked
through the primary years. Their reading development and mastery of
formal language is examined in detail each year through multiple 
measures, as is their instruction, through an array of indices, 
including classroom observations made throughout each academic year. 
In addition, information about the teachers' background, training, 
and language skills is gathered. Data available at this stage of the 
study, from a subsample of 63 children in grades 1-3, on several of 
the components of an interactive reading assessment in English and 
Spanish are analyzed and presented in detail, including charts of 
average growth and performance profiles for a variety of the measures 
used. (MSE)

ABSTRACT

This paper presents an overview of a comprehensive six-year longitudinal
investigation of the development of reading skills during the primary
grades for a large sample of children from bilingual (Spanish-English)
backgrounds, and for smaller samples of children who are monolingual in
English or Spanish. In this "natural variation" study, approximately 350 
children taught by 200 teachers in 20 schools in six districts have been 
tracked through the early years of schooling. The study has carefully 
examined the children's development in language and reading on a yearly 
basis through multiple measures, coupled with an array of indices of the 
instruction received, including classroom observations made throughout each
academic year. In addition, information has been gathered about each 
teacher's background, training, and language skills. 

The paper also includes an exploratory analysis of growth in Spanish 
and English reading skill (along a number of dimensions as measured by a
single instrument) in relation to the instruction received (as documented 
by the observation instrument) over grades 1 through 3 for a small, 
unrepresentative subsample of the target students.



Introduction 

Many children from second-language backgrounds have trouble learning 
to read in schools today, and many of these youngsters are from Spanish- 
language backgrounds, and are impoverished. The Bilingual Reading Study, 
now nearing completion at SEDL, is a comprehensive longitudinal
investigation of the development of reading skills during the primary
grades for a representative sample of more than 250 Texas children from
bilingual backgrounds, and for smaller samples of children who are
monolingual in English and in Spanish. In this "natural variation" study,
teaching and learning have been carefully documented in field settings at
several sites in order to (1) describe variations in both English and
Spanish language competence for students living in bilingual communities;
(2) document prevailing practices in classroom instruction for bilingual
students; and (3) allow a valid examination of the relations between
instructional program and student achievement for students with differing
entry profiles.

We are currently in the middle stages of the longitudinal data
analyses, and in this paper we present data only from a subsample of
the target students concerning their development in English and Spanish
reading as measured by a single instrument, in relation to the instructional
program they have received as documented by the observation instrument, over
grades 1 through 3. The presentation will in general be non-technical, and
is intended only as a sketch of the kinds of data collected with respect to
reading development and instruction, with a glimpse at how the two are
related. This relationship is not quantified here, but we are currently
investigating a model capable of doing so via an index of the distance
variance between the achievement and instructional profiles using
standard-score transformations for each of the measures; a full complement
of technical reports will appear in December of 1984. We now turn to a
general overview of the Bilingual Reading Study.

An Overview of the Design of the Study 

To achieve the objectives of the study, considerable attention was
given to the selection of schools, teachers, and students, to the
instruments for assessing language and reading achievement, and to the
methods for documenting the classroom instruction.

Schools, classes, and teachers. Some 20 schools and 200 teachers have
participated in the SEDL study, providing variations in the nature of the
reading program (a range from phonics-oriented to meaning-based), classroom
organization (some self-contained, others team-taught), and grade structure
(the range of grades in the individual school and the extent of
cross-grading both vary). The schools differ in size, SES, urbanicity, and
makeup of the student body (from medium to high concentrations of bilingual
students).

Student cohorts. The study has been undertaken in three cohorts or
"waves" of students. The first sample drawn was small (N=4^n) and of 
limited generality; the second was somewhat larger (N=80) and covered a 
slightly broader array of contexts. The third sample was both larger
(N=250) and broader in its generality, and incorporated a number of
procedural improvements based on experience gained from work with the first
two cohorts.

All of the bilingual sites are from the state of Texas; included in
the sample are smaller cohorts that are either monolingual in English (from
the northern and central Texas area) or in Spanish (from Chihuahua,
Mexico). Most students entered the study as kindergartners (the remaining
students as first graders), and all will remain in the study through second
grade, 40% of the sample through third grade, and 25% through fourth
grade--a critical period for the development of literacy.

Language assessment. Several types of data have been collected for
each student on English and Spanish language proficiency. Each year, early
in the Fall and late in both the Winter and Spring, we asked teachers to
rate their students' language skills on a number of dimensions. We have
also collected standardized oral language test data from Fall district-wide
administrations. Finally, we have obtained recorded speech samples for
most students in three settings--the classroom, the playground, and the
home.

Reading assessments. Several instruments have been used to measure
reading achievement. We have collected standardized reading achievement
scores when available (mostly in English). More detailed information comes
from a battery of individually administered "performance-based" tests in
both English and Spanish. In kindergarten or on entry to first grade, the
Stanford Foundation Skills Test was employed to measure the child's
prereading skills. From the end of first grade, the Interactive Reading
Assessment System was given during the Spring of each year; this instrument
provides independent measures of a student's skill in decoding, word
meaning, fluency in oral reading, and comprehension under listening and
reading conditions. Finally, informal reading inventories were
administered throughout the school year.

Classroom observations and teacher interviews. Monthly observations
of the reading instruction in each classroom have been made, and teachers
have been interviewed quarterly about their rationale for the program of
instruction. The observation instrument covers staffing, grouping and
organization, time allocation, the language of instruction, the character
of instruction and the materials and procedures employed, and the response
of the students. The interviews focused on the teacher's general
instructional objectives, as well as the objectives for individual target
students. Together, these two instruments yield a rich characterization of
the classroom environment for the target students.

In summary, the database established for the target sample provides a 
relatively comprehensive picture over the primary grades of (1) the 
development of language and reading skills in both English and Spanish, and 
(2) the instruction received during this developmental sequence. The next 
step to be taken is to link these two data sets, and we now turn to the 
analysis which explores this linkage. 

The sample for the exploratory analysis discussed in this paper
includes all bilingual students from Cohorts I and II who had completed
third grade by the end of the fourth year of the study (Cohort I, N=36;
Cohort II, N=27). The first cohort came from the South Texas-Mexico border
area, while Cohort II was drawn from the West Texas-Mexico border area.
Both areas are rural, of low socioeconomic status, and have large numbers
of Spanish-dominant students.

Summarizing Progress in Reading

A primary purpose of the Bilingual Reading Study was the investigation
of patterns of growth in reading achievement, and in the mastery of formal
styles of language, of which reading is a special instance. This
discussion will begin with a presentation of the concept of the growth track
as a summary of the acquisition of reading skills over time. This concept
will be linked to the separable-process theory, and to the design of the
Interactive Reading Assessment System (Calfee & Calfee, 1979, 1981; Calfee,
Calfee, & Peña, 1979), the instrument that will receive most attention
here. Next we will present the aggregate summaries of the reading
achievement of the two cohorts, followed by a discussion of a series of
protocols for individual students.

The Growth Track 

The Bilingual Reading Study adopted as a foundational assumption the
notion that reading is a dynamic, developmental process, and thus it was
necessary to tailor both the design and the data analysis to be sensitive
to the character of changes in student performance over time--more
specifically, to trends that occur over the four or five years that comprise
the primary reading program.

Although reading research (and educational research in general) has
given little attention in recent years to the measurement of the course of
learning (e.g., "What is the shape of the learning curve?" is a question
that seldom arises in educational research at present), instructional
programs still reflect this dimension. For instance, basal reading materials
are carefully graded to present the student with a set of learning
materials and experiences that gradually increase in difficulty as the
student moves through the program.

The Interactive Reading Assessment System (IRAS) incorporates the
developmental dimension of the basal reading series for all components of
the separable-process design. For instance, the series of words at the
beginning of the test, which the student is asked first to decode and then
to define, is graded by reference to several of the standard word counts
used in basal-reader designs. The synthetic word lists for assessment of
decoding in IRAS are ordered according to several factors known to affect
difficulty of pronunciation as these are reflected in the typical
scope-and-sequence charts. The sentences used for assessment of oral reading
increase in a regular fashion on the factors of length (number of words)
and syntactic complexity. The texts used to assess comprehension increase
in overall length (number of words), propositional "load" (Kintsch & van
Dijk, 1978; for practical purposes, this factor is the number of
distinctive ideas), and text structure (Calfee & Curley, 1979); expository
texts of increasing formality are introduced at the second grade level in
addition to the narrative texts appearing at all levels.

The construction of the materials in IRAS was graded with the aim of
introducing a one-year increase for every two levels on the test. Thus,
success on Level A for each of the IRAS components should correspond more
or less to the curriculum halfway through first grade, success
on Level B should identify a student who could handle the materials at the
end of first grade, and so on.
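
The correspondence between IRAS levels and grade placement can be sketched in
a few lines of Python. This is only an illustration of the stated design aim
(one grade year per two levels); the function names are hypothetical, and the
sketch assumes that levels are lettered consecutively from A.

```python
import string

# Design aim stated in the text: one grade year for every two IRAS levels.
LEVELS_PER_YEAR = 2


def level_index(letter: str) -> int:
    """Ordinal position of a level letter (A -> 1, B -> 2, ...).

    Assumes consecutive lettering from A; this is an illustrative assumption.
    """
    return string.ascii_uppercase.index(letter.upper()) + 1


def grade_equivalent(letter: str) -> float:
    """Nominal grade years completed, counted from the start of first grade."""
    return level_index(letter) / LEVELS_PER_YEAR
```

Under these assumptions Level A maps to 0.5 (halfway through first grade) and
Level B to 1.0 (the end of first grade), which matches the description above.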

The design of IRAS into components and levels for each component was
coupled with an informative, but efficient, technique for determining the
student's proficiency level for each component--in essence, the technique
was to locate as quickly as possible two critical levels for each
component: the level at which the student did relatively well, and the level
at which the student did relatively poorly. These two levels were usually
adjacent to one another.

The details of this strategy are described in the IRAS manual, but a
couple of examples will help the reader who is not familiar with the
instrument. The first task for the student was to scan a series of graded
word lists, six words per list. The student was informed that the words
increased in difficulty from one list to the next, and was asked to scan
through the series until he or she encountered a list that was too
difficult to "read" (i.e., to decode). Virtually every student understood the
task without apparent difficulty, and most students quickly went about
searching through the lists to find the limits of their mastery. Once a
selection was made, the student was asked to pronounce each word on the
immediately preceding list. If the student did reasonably well, the next
more difficult list was presented for pronunciation; if the student did too
poorly on the first list, the next easier list was presented. This
procedure was continued until failure was found for students presented with
more difficult lists than the initial one or until success was found for
students presented with lists less difficult than the initial list
presented.
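
The search for the two critical levels can be sketched as follows. This is a
minimal reading of the procedure described above, not the procedure from the
IRAS manual: the callback `passes`, the integer level numbering, and the
`lowest`/`highest` bounds are all hypothetical names introduced for
illustration.

```python
from typing import Callable, Optional, Tuple


def find_critical_levels(
    start: int,
    passes: Callable[[int], bool],
    lowest: int,
    highest: int,
) -> Tuple[Optional[int], Optional[int]]:
    """Return (highest level passed, adjacent level failed) around `start`.

    `passes(level)` stands in for administering one list and judging whether
    the student met the success criterion on it (a hypothetical callback).
    Either element is None when the search runs off the end of the scale.
    """
    level = min(max(start, lowest), highest)
    if passes(level):
        # Success on the first list: present harder lists until one is failed.
        while level < highest and passes(level + 1):
            level += 1
        failed = level + 1 if level < highest else None
        return level, failed
    # Failure on the first list: present easier lists until one is passed.
    failed = level
    while level > lowest:
        level -= 1
        if passes(level):
            return level, failed
        failed = level
    return None, failed
```

For a simulated student who can decode lists 1 through 5 and who selects
list 7 as too difficult, testing starts on list 6: the call
`find_critical_levels(6, lambda lvl: lvl <= 5, 1, 12)` returns `(5, 6)`,
two adjacent levels, after administering only two lists.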

The second example is the comprehension task. The critical vocabulary
level provides an estimate of the level of text at which the student can
read aloud with a reasonable degree of fluency. This estimate needs to be
off only slightly to substantially increase the amount of testing time
required for comprehension assessment--the vocabulary definition and the
text comprehension tasks required a disproportionate amount of time because
a free-response mode was employed. A more precise estimate of the critical
text level, along with an efficient sample of oral reading fluency, was
gained by having each student read aloud a series of sentences of graded
difficulty. Each sentence could be read by a proficient reader in less
than 20 seconds. If a student made too many mistakes or took too long on
one of the sentence sets, the tester stopped the task at that level, and
presented the narrative text at the next lower level for assessment of
comprehension.

The critical-levels technique generated two types of information on
each of the component tasks. One measure was the student's highest level
of success, where "level" refers to the IRAS levels described previously.
The second measure was an index of the quality of the response on the two
critical levels--highest level of success, and the next level where
performance dropped below the critical value required for success. Throughout
IRAS, these criteria were generally set quite low, so that if a student
made 'correct' responses to half or more of the items contained in a given
level of a task, this was considered success.

In deriving a score for a given task, the quality of individual
responses within each level attempted was first scaled for degree of
'correctness.' For example, in Definitions, a response providing a
complete formal definition was assigned a score of '3'; a poor, but
acceptable, definition was given a value of '2'; a correct multiple-choice
response received a '1'; and an incorrect multiple-choice response was
assigned a value of '0.' For purposes of determining success at a given
level, any item receiving a value above 0 was counted as a successful
response. An index of the quality of response at a given level was formed
by calculating the proportion of points received relative to the total
number of possible points at that level. To summarize a student's
performance on a given task, both level and quality indices were included
by taking the ordinal value of the level of highest success, and adding to
it the average of the quality indices at that level and the next level
where performance was not successful (e.g., a student who passed level E
with 75% of the total possible points, but failed level F with 25% of the
total possible points received a score of 5.5).
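
The scoring rule just described can be written out as a short Python sketch.
The function names are hypothetical, and the success rule below uses the
"half or more of the items" criterion that the text says was generally, not
universally, applied.

```python
from typing import Sequence


def quality_index(item_scores: Sequence[int], max_per_item: int) -> float:
    """Proportion of points earned out of the points possible at one level."""
    return sum(item_scores) / (max_per_item * len(item_scores))


def level_passed(item_scores: Sequence[int]) -> bool:
    """Success if half or more of the items received a value above 0."""
    successes = sum(1 for score in item_scores if score > 0)
    return successes >= len(item_scores) / 2


def task_score(level_ordinal: int, quality_passed: float,
               quality_failed: float) -> float:
    """Ordinal of the highest level passed plus the mean of the quality
    indices at that level and at the next (failed) level."""
    return level_ordinal + (quality_passed + quality_failed) / 2
```

With level E taken as the fifth level, `task_score(5, 0.75, 0.25)` gives 5.5,
reproducing the worked example in the text.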

As noted earlier, students were tested with both the English and 
Spanish versions of IRAS each Spring. The Spanish version was constructed 
to parallel the English version and was not simply a translation. Rather, 
the same principles used to ground the English version were followed in
building the Spanish version (e.g., Spanish word counts were employed to
select appropriate vocabulary items for the decoding, definition, and 
comprehension tasks). The scales within the Spanish version were scored in 
the same fashion as those just described for the English version. 

The design of IRAS, together with the technique for determining a 
student's level of competence on the test, led to the postulation of an 
extremely simple model of growth over time. The model, shown in Figure 1, 
represents student growth over the years of schooling as a straight-line
function. The correspondence with the grade level of the basal reading 
series is also displayed on the graph, along with boundary limits for 
progress that are one year above or below the expected level. The 
"typical" student, based on the instructional materials, should have 
trouble with the lowest level of IRAS in kindergarten, but should meet 
criterion on the second, fourth, and sixth levels of the test when exiting 
from the first, second, and third grades, respectively. 

The normative model shown in Figure 1 implies linear growth, as does 
the IRAS "levels" model. A linear model of progress has much to recommend
it, and to the extent that the design of the materials for IRAS has 
achieved this goal, the growth track for this test should yield data that 
are extremely easy to interpret. 
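
As an illustration, the normative growth track and its one-year boundary
limits can be expressed directly. The sketch below assumes the
two-levels-per-year design and counts kindergarten as grade 0; the function
names and the band labels are hypothetical.

```python
# Design aim stated in the text: one grade year for every two IRAS levels.
LEVELS_PER_YEAR = 2


def expected_level(grade_at_exit: int) -> float:
    """Level the 'typical' student should reach on exit from a grade (K = 0)."""
    return LEVELS_PER_YEAR * grade_at_exit


def track_position(score: float, grade_at_exit: int) -> str:
    """Place a score relative to the band one year either side of expectation."""
    gap_in_years = (score - expected_level(grade_at_exit)) / LEVELS_PER_YEAR
    if gap_in_years > 1:
        return "above band"
    if gap_in_years < -1:
        return "below band"
    return "within band"
```

Here `expected_level` returns 2, 4, and 6 for exit from first, second, and
third grade, matching the criterion levels named above, and a third-grade
exit score of 5.5 falls within the one-year band.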

Analysis of average performance. This section describes the
longitudinal average data for the subsample of 63 students drawn from Cohorts
I and II on several (but not all) components of the Interactive Reading
Assessment System in English and Spanish.

Let us first examine performance on the Definitions task, the data for 
which are shown in Figure 2. The averages have been laid out according to 
the growth track model presented earlier. The English and Spanish results
have been juxtaposed in the graph to permit an assessment of the degree of
bilingualism in this sample of students. The Definitions task is based
entirely on students' oral language skills--the target word is pronounced
for the youngster, who is first asked to explain what the word means, and 
who is then prompted with a set of three alternatives if a suitable 
explanation is not forthcoming. This task, therefore, seems appropriate as 
one index of the level of competence in the spoken language and, at the 
upper levels of IRAS, of formal knowledge of the more complex concepts from 
each of the languages. 

The layout of the aggregate data on the growth track follows the same 
pattern for each of the graphs that follow, and thus a detailed discussion
of Figure 2 is called for. Students were tested in the Spring of each 
school year, so that the data points in the figure represent performance on 
exit from the grade indicated on the abscissa. Additional information is 
available for precursor skills and language performance on entry to kinder- 
garten and at several other points along the growth track. These sources 
of data will be "added to the track" during subsequent analyses to provide 
a more complete representation of growth patterns in reading. For the 
present, the focus is on IRAS data at the end of first through third 
grades. The boldface symbols in the figure mark the averages for Defini- 
tion of English words (DE) and of Spanish words (DS). The vertical bars 
beside each of the averages indicate the extent of variability (one 
standard deviation on either side of the mean). 

As cautioned earlier, the averages in Figure 2, and elsewhere in this 
paper, are based upon a subsample of the first two cohorts only, a small 
and unrepresentative sample of the overall target student sample. With
this caution in mind, a few observations nonetheless deserve mention.
First, as we suspected, students are able to define words that are
normatively quite a bit more difficult than the readability limits in typical
basal series, especially in the early primary grades, where the level of
word knowledge is almost two grade levels ahead of the nominal limit.

The second point to notice is that the averages for English and 
Spanish are virtually identical for this sample. The reader should bear in
mind that averages are seldom typical of individual profiles. The means in
the figure could reflect a situation in which most of the children are
equally competent in Spanish and English at all grades, or one in which half
of the students are quite proficient in one of the languages and virtually
deficient in the other--or any of a number of other patterns.

Some idea of the behavior of individual students in first grade can be 
gained from the scatterplot in Figure 3 of English versus Spanish defini- 
tion performance. A preponderance of the students demonstrated substantial 
competence in both languages; 27 of the 63 scored above the third-grade 
level in both languages at the end of first grade. Another 13 children 
were quite capable in English, though not in Spanish, and the remaining 
students did poorly on the English test, with varying levels of capability 
on the Spanish test. Only two of the students were below the first-grade
level in both languages. 

The final matter to be noted is that the level of vocabulary
proficiency increases in both languages from first through third grade, but
at a rate that is substantially less than "a-year-for-a-year," and at a
somewhat slower rate for English than for Spanish. The students possess a
relatively extensive vocabulary at the end of first grade (the typical
student can define English words like company, crowd, and electric). After
three years of schooling they have made relatively little additional progress
through the Carroll frequency list (Carroll, Davies, & Richman, 1971); they
have reached words like committee, invisible, and mission, but have not
learned to handle permanent, annual, and literature.

Figure 3. Scatterplot of IRAS Vocabulary Definition scores in English and
Spanish for Grade 1. [Axis label: IRAS-Spanish Vocabulary Definitions]

One of the more significant features of Figure 2, for purposes of the
methodology of this study, is the tendency for the average scores to
increase in a straight-line fashion over grades. The entry level is higher
than the grade-equivalent model projects, but the reasons for this
departure from the model have been noted. The rate of growth over time is less
than the simplest version of the model predicts; we suspect that this
result may be a relatively accurate reflection of the actual effectiveness
of vocabulary instruction as presented in the typical basal series.
Nonetheless, to the extent that performance does undergo systematic change
over the years of schooling, the data in Figure 2 suggest that aggregate
changes for Vocabulary Definitions in both English and Spanish can be
accounted for reasonably well by postulating a linear growth model--students
tend by and large to move across the levels of IRAS at a constant rate over
time.
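
One simple way to put a number on the rate of growth under a linear model is
an ordinary least-squares slope of mean level on grade. The sketch below is
an assumption about how such a rate might be estimated, not the analysis
used in the study; the function name is hypothetical and no study data are
included.

```python
from typing import Sequence


def growth_rate(grades: Sequence[float], mean_levels: Sequence[float]) -> float:
    """Least-squares slope of mean IRAS level on grade (levels per year)."""
    n = len(grades)
    g_bar = sum(grades) / n
    m_bar = sum(mean_levels) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((g - g_bar) * (m - m_bar) for g, m in zip(grades, mean_levels))
    sxx = sum((g - g_bar) ** 2 for g in grades)
    return sxy / sxx
```

A slope near 2 levels per year would correspond to the "a-year-for-a-year"
design of the test; a smaller slope corresponds to the slower growth
described above for Vocabulary Definitions.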

Decoding is an important component in beginning reading according to
many scholars (e.g., Chall, 1979). There is considerable controversy about
when and how phonics instruction should be introduced, but most basal
systems have resolved this controversy by providing materials that permit
an eclectic approach--phonics materials are made available for the teacher
to use even in those series that stress comprehension from the earliest
grades.

Figure 4 displays the average level of performance and the amount of
between-student variability for the two tasks designed to assess decoding
skill in the English version of IRAS. The two measures tend to
cross-validate each other in the aggregate--performance is at about the level
expected from the design of the IRAS materials at the end of first grade,
and increases at somewhat more than a year-per-year over the following two
years. By the end of third grade, students can pronounce real words at
about the same level that they can define them, on the average. By the end
of first grade, synthetic words can be pronounced if they conform to
relatively simple consonant-vowel-consonant patterns. By second grade, the
typical student can handle a variety of more complex patterns (consonant
digraphs like SH and vowel-plus-R combinations), along with polysyllabic
words based on familiar combining forms (-ED, -ING, -FUL, UN-, and IM-).
Third graders can manage the most complex Anglo-Saxon spellings on the list
(KNOP, WRUOGE) and relatively simple polymorphic words (OACTURE, BEFADE),
but are not able at the end of third grade to handle the complex Romance
spellings (AFFREMIATION).

Figure 4. Average growth track data for Vocabulary Decoding (V) and
Synthetic Word Decoding (S) as measured by IRAS in English. [SEDL Bilingual
Reading Study, Preliminary Analysis of Sample From Initial Cohorts;
horizontal axis: entry and exit points from kindergarten through fourth
grade.]

It is important to note that the variability in decoding skills is 
quite substantial, especially at third grade. Some students are doing 
extraordinarily well, while others remain at a very low level. 

Figure 5 shows the results for English Reading and Listening
Comprehension (narrative texts only--analyses of the expository texts are
forthcoming). In general, the ability of this sample of youngsters to handle
connected text is much below the level that one might expect, judging from
their performance on either the Definitions or the Decoding components.
Performance is close to the grade-equivalent level at the end of first
grade, but increases by only about one grade level between that time and
the end of third grade. The students are better at listening comprehension
than reading comprehension, as one might expect (problems with 'fluent'
decoding would tend to cause difficulty for some students when reading on
their own), but both forms of comprehension exhibit roughly equivalent
progress from first through third grade. The Listening Comprehension index
is bounded--the upper limit is at level 7.0--which accounts for the
relatively small amount of variability in this measure during third grade, and
which may have unduly restricted the sensitivity of this performance
measure for the more capable third graders. While the average in the
figure is more than a full level below the limit, examination of the
frequency distribution revealed that half of the third-grade students had
reached the top-most level in listening comprehension. The Reading
Comprehension index is well below the upper limit, and the observed level--a
year below the expected level at third grade--is probably a trustworthy
indication that the students are not performing at grade level. It also
appears that they are not performing up to their potential in comprehension
given the level of their skills in vocabulary and decoding.

Figure 5. Average growth track data for Reading Comprehension (R) and
Listening Comprehension (L) as measured by IRAS in English. [SEDL Bilingual
Reading Study, Preliminary Analysis of Sample From Initial Cohorts;
horizontal axis: entry and exit points from kindergarten through fourth
grade.]

We turn next to performance on the Spanish version of the IRAS. 
Decoding skill levels are shown in Figure 6. Both the Vocabulary and the 
Synthetic Word tasks complement one another, as was true for the English 
version. Performance is at the levels determined by the design of the test 
as appropriate to the students' grade level. The rate of change is not as 
rapid for Spanish as for English (Figure 4). 

Comprehension in Spanish is displayed in Figure 7. The patterns are
similar in some respects to those for English--Listening superior to
Reading, performance in both areas below the levels for decoding and
definitional skills, and a slower rate of progress than expected from the
design. There is one noticeable difference between the English and Spanish
plots: Reading Comprehension in Spanish is virtually negligible at the end
of first grade, and increases only slightly over the next two years. The
students in this sample are able to define Spanish vocabulary, and have
learned to decode words when presented one at a time in isolation. They
cannot handle connected text; fluent reading and the automaticity required
for comprehension have not been acquired by this sample of students--not in
Spanish.

Analysis of individual protocols. The aggregate data presented above
provide a characterization of the general trends in the data, but give
little insight into the performance of individual students--these will be
examined in this section. Four groups of students will be presented below;
each group is distinct from the others because of the sequence of
teachers who taught them from first through third grade, and the large
variations in the instruction received from these teachers.

The first two protocols, Figures 8 and 9, are for students in Group 
A. These two students excel on the Definitions task in both English and 
Spanish; they also do extremely well on both of the Decoding tasks (the low 
level for Student 0007 on the Synthetic Decoding Task in second grade 
appears to be an unexplainable outlier). Comprehension in English is at or 
above grade level for both students; the levels for Reading and Listening 
are fairly close. Comprehension in Spanish is lower than in English, 
varies over the two students, shows a tendency to increase from first to 
third grade, but is much below the levels of Definition and Decoding. 

The next protocol, Figure 10, is for a single student selected as 
representative of Group B. The level of performance on the Definition task 
appears relatively high at the end of first grade (actually, the student is 
at the average for all students), and remains constant after that. This 
pattern holds for both English and Spanish. Decoding skills and 
Comprehension performance increase over the grades, somewhat below the 
expected rate, but still with a noticeable amount of growth. 

The protocols for the two students from Group C (Figures 11 and 12) 
may appear rather confusing at first glance. Student 2097 appears to have 
no command of English through the end of second grade, based on the 
performance on the Definition task, whereas Student 2082 is quite fluent in 
English by the end of first grade. Both students have a reasonable command 
of Spanish vocabulary by the end of the first grade. The data on English 
Decoding are quite consistent for both youngsters--neither could decode 
anything on IRAS, whether familiar or synthetic words--through the end of 
second grade. Both students showed marked improvement in their skills at 
decoding English words between the end of second grade and the end of third 
grade. Neither youngster had much success in decoding Spanish through the 
end of second grade, but both showed some gain during third grade, 
especially Student 2097. Finally, both youngsters had considerable 
difficulty in comprehending spoken English text at the end of first grade. 
Reading Comprehension in both languages was negligible through the end of 
second grade, with the two youngsters exhibiting variable degrees of 
success in English Listening Comprehension, and doing reasonably well in 
Spanish Listening. 

The patterns are complex, but a plausible theme can be constructed. 
The students differ markedly in their entry language capability, and one 
might suspect that they differed on entry to school (a fact confirmed from 
the language measures collected). In any event, nothing happened during 
the first and second grade that allowed either student to attain mastery of 
reading skills of any sort in either language--both children remained 
illiterate after the first two years of primary-grade instruction. During 
third grade, both students made substantial gains in English literacy, to 
the extent that they are reasonably close to the grade level represented by 
the IRAS design. There is also some evidence of progress in Spanish 
literacy, slightly less than that observed in English. 

Three students have been selected to represent Group D (Figures 13 to 
15). The first two students reflect a similar pattern of growth in English 
reading skills--from the end of first grade on, they are making 
satisfactory progress in all of the components of reading that are included 
in the IRAS design, approximating or exceeding the levels indicated by IRAS 
as appropriate for their grade assignment. Their scores on the Spanish IRAS 
are also high, with the exception of the second grade Synthetic Decoding 
score for student 0044, and the typically depressed performance in Reading 
Comprehension. 

The third student in the series shows a totally different pattern of 
performance. For both the English and Spanish scales, progress is apparent 
in the oral language skills, and in "sight word" vocabulary; the student is 
able to define most words in the IRAS series, can comprehend passages that 
are recited by the tester, and can pronounce familiar words appropriate to 
grade assignment. However, the youngster shows no evidence of having 
acquired any skills in phonic analysis by the end of third grade, and the 
ability to read and comprehend connected text is much below grade level. 
This student is obviously bilingual, given the level of success on the 
Definition and Listening tasks, but shows no sign of acquiring literacy in 
Spanish. 

In a later section of this paper, we will examine the sequence of 
instructional programs provided to the four groups of students whose 
reading performance has just been described. In general, there is a fair 
amount of consistency in the performance of students within a given 
instructional sequence--this generalization is based upon an "eyeball" 
analysis of the sample of students selected for assessment in each school, 
and will be assessed by a more formal, forthcoming analysis. The patterns 
are often quite distinctive, suggesting that the teacher's decision to 
emphasize one aspect of literacy over another can have noticeable effects 
on student achievement. The range of patterns in the series of figures 
just presented does make a point that has been confirmed by a correlational 
analysis--the components of reading as measured by IRAS do not exhibit the 
high degree of collinearity typical of most reading tests. A student may 
do quite poorly on one facet of IRAS, yet do quite well on other facets. 
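
The kind of correlational check referred to here can be illustrated with a 
minimal sketch. It assumes that each student's scores on the IRAS 
components are available as parallel lists; the function names and the 
dictionary layout are hypothetical, introduced only for illustration, and 
are not the study's own analysis programs. Pairwise coefficients well 
below 1 would correspond to the relative independence described above. 

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if either list has no variation.
    return sxy / sqrt(sxx * syy)


def component_correlations(scores):
    """Correlate every pair of components.

    `scores` maps a component name (hypothetical labels such as
    "decoding" or "reading_comprehension") to a list of student scores,
    with students in the same order in every list.
    """
    return {
        (a, b): pearson(scores[a], scores[b])
        for a, b in combinations(sorted(scores), 2)
    }
```

Each key of the returned dictionary is a pair of component names, so the 
full set of pairwise coefficients can be inspected at once. 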

The relative independence of the IRAS components raises some problems 
for analysis, but the instrument also has the capability of assessing 
student responses to differential emphasis in the curriculum, a point that 
will be made more clearly as we report on the instructional program in the 
section which follows. Finally, though there is a fair degree of 
consistency in students' responses to instruction, individual differences 
are also observed, as seen in the three students in Group D. This point 
will be raised again in presentation of the instructional profiles. 

Observation of Classroom Instruction 

In this section we will describe the procedures used in observing 
classroom instruction, and the preliminary analyses of this data set for 
the sample of teachers providing instruction for the subsample of students 
in Cohorts I and II (13 first-grade teachers, 8 second-grade teachers, and 
8 third-grade teachers). 

Reading and Mathematics Observation System. The reading instruction 
period of each classroom in the Bilingual Reading Study was observed an 
average of five times during the year that a target student was enrolled in 
the class, with each observation lasting from 45 to 60 minutes. The 
Reading and Mathematics Observation System (RAMOS; Calfee & Calfee, 1976) 
was the instrument used to record the observations. RAMOS is a real-time, 
categorical system in which, for each of the instructional groups in the 
classroom, the observer notes the significant changes that take place over 
time in each of several categories considered to be significant indicators 
of effective instruction. The categories selected for this preliminary 
analysis include: 

• ROLE -- tracked the teacher's involvement in direct teaching, from 
direct instruction, to discussion, to facilitation, to 
non-instructional engagement such as management or the preparation of 
materials. 

• LANGUAGE OF INSTRUCTION -- at one end of this continuum, instruction 
was entirely in English, while at the other extreme only Spanish was 
used. 

• DECODING -- documented the relative curriculum focus on decoding at a 
given moment, from emphasis on analytic phonics skills (such as 
letter-sound recognition, spelling pattern recognition) to integrative 
skills, such as whole word recognition, to non-decoding skills such as 
auditory discrimination, visual discrimination, and letter 
recognition. 

• COMPREHENSION -- documented the relative curriculum focus on 
comprehension activities, from emphasis on major ideas and making 
inferences, to literal facts, to vocabulary enrichment, to 
non-comprehension activities. 

• TECHNIQUE -- the instruction may have emphasized global features of a 
topic followed by analysis (whole-to-part), or begun with an analytic 
strategy followed by integration (part-to-whole). 

• TASK -- the work assigned to the student may have entailed activities 
directly related to the formal treatment of language (writing, 
discussing, listening, or reading), or it may have had no immediate 
relevance to formal language (handling art materials). 

• MATERIALS -- the materials for instruction could have been books or 
book-related media, or could have had no direct relevance to the print 
medium (art material, picture cards). 

• PRODUCTIVITY -- throughout the observation, the observer continuously 
rated the productivity of each group as high, medium, low, or none. 

• NOISE -- a judgment of the amount of noise for each group in the 
classroom was also made, again, from high to none. 

This abbreviated description is intended only as a sketch of the 
observation system; the categories available to the observer under each of 
the headings listed above were quite extensive, providing the observer with 
relatively concrete guides to the appropriate codes. 

Analysis of average scale values for RAMOS. In its original form, 
RAMOS resembles a narrative of the events in the classroom. A 
moment-to-moment classification of each event for each group is available 
in an abbreviated code, which can be read by an experienced observer, but 
which is not immediately "understandable" by the computer. Accordingly, a 
PASCAL program was prepared to convert the coded format into an expanded 
format, in which the codes for each group for every minute of observation 
were presented in a line-by-line record. This expanded record was then used 
to obtain (a) the average value for each category over time for each group 
in the classroom for each observation, (b) weighted averages taking group 
size into account for the classroom as a whole, and finally, (c) an average 
for each category for every teacher (collapsed over observations). These 
aggregate data are subject to the same limitations discussed earlier in 
connection with student achievement measures. 
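
The three levels of aggregation just described can be sketched as 
follows. This is an illustration in Python rather than the original 
PASCAL program, and it rests on stated assumptions: the actual RAMOS code 
format is not reproduced in this paper, so the run-style input of (group, 
start minute, end minute, scale value) tuples, the function names, and the 
group labels are all hypothetical. 

```python
from collections import defaultdict


def expand(segments):
    """Expand coded runs into one (group, minute, value) record per minute.

    Each segment is an assumed (group, start_minute, end_minute, value)
    tuple for a single RAMOS category; end_minute is exclusive.
    """
    records = []
    for group, start, end, value in segments:
        for minute in range(start, end):
            records.append((group, minute, value))
    return records


def group_averages(records):
    """(a) Average scale value over time for each group in one observation."""
    totals = defaultdict(lambda: [0.0, 0])
    for group, _minute, value in records:
        totals[group][0] += value
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


def class_average(group_means, group_sizes):
    """(b) Classroom average, weighting each group mean by group size."""
    n_students = sum(group_sizes[g] for g in group_means)
    return sum(group_means[g] * group_sizes[g] for g in group_means) / n_students


def teacher_average(class_means):
    """(c) Teacher average, collapsed over that teacher's observations."""
    return sum(class_means) / len(class_means)
```

For example, a group coded 9 for ten minutes and 5 for the next ten 
receives a group average of 7; if that group has six students and a second 
group of four students averages 3, the weighted classroom value is 5.4. 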

The average scale values for each of the RAMOS categories are shown in 
Figure 16 for the 29 first through third grade classrooms. As indicated in 
the figure, the classifications used by the observers for each category 
were arranged on a unidimensional numerical scale, generally ranging from 1 
to 9. The scaling was based on the judgment of the Laboratory staff and 
consultants experienced in classroom instruction. The figure shows that 
some of the categories changed little or not at all over grades (e.g., 
Language), whereas other categories changed rather markedly (e.g., 
Comprehension, Task, and Materials). 

These averages are presented primarily as a frame of reference for the 
group protocols to be described shortly, and are of limited generality 
because of the restrictions on the sample. However, certain trends in the 
data deserve mention. Teachers adopted a role of direct instruction less 
than two-thirds of the time in these classes--the typical role was slightly 
more active than "facilitation," but not much more so. English was the 
predominant language at all grades. Decoding was not greatly in evidence, 
less than 20 percent of the time. Comprehension-like activities were more 
common, especially in second and third grades. (The Decoding and 
Comprehension scales can be added together for a rough estimate of the 
amount of emphasis on these two components of reading.) As can be seen 
under Focus, only about half of the time was spent with an emphasis on 
text-based instruction, with a noticeable increase from first to second 
grade. The next three scales also increase from first grade onward--the 
trend is toward more whole-to-part instruction, toward greater formality in 
the tasks presented to the students, and toward greater reliance on formal 
textual materials. The final two panels indicate that students were 
reasonably productive on the average, and that the noise level was judged 
to be slightly high in first and second grade, becoming quieter in third 
grade. 

As a caution, these average values reveal nothing about the trends 
over the school year, nor do they show anything about within-class varia- 
tions from group to group--these analyses will be forthcoming. The 
averages also conceal the differences between classrooms, which turn out to 
be rather substantial in some instances, as we shall see next. 

Analysis of instructional sequences. In Figure 17 are the 
observational profiles for the four groups of students whose achievement 
data were presented earlier in the report. For three of the groups, 
classroom data were available for all three grades; the second grade data 
for Group A have yet to be analyzed. Two subsets of scales--Performance/Noise 
and Focus/Technique/Materials/Task--were moderately correlated in this data 
set, and have been combined by means of average standardized scores to 
simplify the exposition. 
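
Combining scales by average standardized scores can be sketched as 
follows. The sketch makes several assumptions that the paper does not 
settle: scores are standardized across classrooms using the population 
standard deviation, every scale is already oriented so that higher values 
point the same way, and the function names and scale labels are 
hypothetical. 

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize one scale across classrooms (mean 0, SD 1).

    Assumes the population SD; a scale with no variation across
    classrooms would raise ZeroDivisionError.
    """
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def composite(scales):
    """Average the standardized scores of several scales per classroom.

    `scales` maps a scale name to a list of classroom values, with
    classrooms in the same order in every list. Returns one composite
    score per classroom.
    """
    standardized = [z_scores(values) for values in scales.values()]
    return [mean(column) for column in zip(*standardized)]
```

Because each scale is standardized before averaging, a scale with a wide 
numerical range does not dominate the composite. 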

We will now attempt to translate the instructional sequence for each 
group of students into descriptive prose. First grade for Group A entailed 
a high degree of direct instruction, concentration on English, an average 
level on the combined FTMT scale (first grade classrooms had an overall 
z-score of -.5 on this scale), a quiet and productive environment, an 
average amount of time on decoding, and more than average emphasis on 
comprehension. The overall picture is one of a well-managed classroom with 
a well-defined focus on the acquisition of English literacy. Data on the 
second grade program are not available for this group of students, but the 
third grade program tends to follow a pattern similar to the first grade 
class, with a few noticeable differences. The third grade teacher relied 
less on direct instruction, resorted somewhat more often to Spanish, and 
gave more than usual emphasis to decoding. 

The instructional sequence for Group B reveals relatively less 
reliance on direct instruction in first grade, with average levels 
thereafter. Instruction was entirely in English. The FTMT scale, which 
provides an index of the extent of formality in the program, is about 
average in first and third grades, with special emphasis in second grade. 
The classrooms were relatively quiet and productive, with moderate amounts 
of time devoted to instruction in decoding and comprehension. Overall, the 
program for Group B was well managed and focused on reading instruction--an 
emphasis on English literacy to the complete exclusion of instruction in 
Spanish. 

Group C differs from the previous two groups in several respects. The 
instructional level varies from average (first and third grades) to low 
(second grade). Instruction was predominantly in Spanish during the first 
two grades, after which English was used in third grade. The FTMT scale 
was extremely low in first grade, implying little or no emphasis on formal 
aspects of language and text. This scale was at or above average in second 
and third grades. The class was noisy and unproductive in both of the 
first two grades, rising to an average level in third grade. The 
curriculum included attention to both decoding and comprehension at all 
grades. The overall picture shows a lack of coherence and management in the 
first two years, followed by a reasonably focused program in third grade. 
The low productivity and the high noise in first and second grades suggest 
poor management. In first grade the teacher was moderately active 
instructionally, but gave little attention to the topic of literacy in 
either English or Spanish. The second grade teacher gave more emphasis to 
literacy, but played a relatively passive role in the classroom. Only in 
third grade were all the elements of an effective instructional program 
brought together. 

Finally, let us examine the pattern for Group D. The primary years 
for the students in this sequence were characterized by a high level of 
direct instruction, exclusively in English, in classrooms that were produc- 
tive and relatively quiet, and with considerable emphasis on decoding and 
less attention to comprehension. The level of formal language in first 
grade was slightly below average, but increased sharply during the second 
and third grades. 

The patterns in Figure 17 are chiefly of interest to the degree that 
they can be related to student achievement, but two general reactions merit 
some attention. First, the sequence of instructional activities varies 
considerably from one group to another. Most research on teaching has 
focused on a single slice-of-time in the life of the student--a day, week, 
month or school year. The data in Figure 17, to the degree that they are 
valid representations, suggest that the course of instruction for the 
individual student may vary quite a bit from one year to the next, in ways 
that reflect little in the way of a coherent school-wide program. 

The second reaction is to the variations in the specifics of the pro- 
file from one classroom to another. The data in Figure 17 cannot be sum- 
marized by the contrast between "good" and "bad" classroom programs. The 
11 teachers represented in the figure (more precisely, the 11 teacher-year 
events) vary more or less independently on the set of dimensions 
incorporated in the RAMOS protocol. Again, this particular feature of the 
RAMOS instrument requires a more complicated plan of analysis, when 
contrasted with instruments that focus on one or two dimensions of 
classroom instruction (e.g., instructional time, patterns of verbal 
interaction, or the like). The multidimensional character of the RAMOS data 
structure increases our confidence in the validity of the instrument, 
however, because it seems to us a reasonable conjecture that the 
instructional programs of classroom teachers actually vary on more than a 
single evaluative dimension. 

Measuring the Linkage Between Instruction and Achievement 

Assessing the degree of correspondence between the complex patterns 
represented by Figures 8 to 15 and Figure 17 poses some interesting 
challenges. On the one hand, an eyeball approach has much to recommend 
it--human beings are quite capable of perceiving complicated relations in 
the midst of noisy environments. On the other hand, there is much to 
recommend procedures that are quantifiable and technically reproducible. 
Our approach in the Bilingual Reading Study has been to rely on experienced 
judgment to carry out preliminary evaluations, and to explore one or two 
methods for quantification. This work is still ongoing, and in this 
section we will only briefly describe the relations we are seeing. 

The general picture that emerges from the results is fairly simple-- 
the target students tended to show higher levels of reading achievement 
when instruction emphasized the more formal aspects of language, and when 
the classrooms were well managed. Groups A and B illustrate some 
variations on this theme, and the students within these classroom programs 
appear quite competent in all components of English reading. Group A 
included some time for Spanish reading (most attention given to decoding, 
it would appear, though the analyses done thus far do not provide evidence 
on this point), and the students showed the benefits, compared to the Group 
B student, who received no instruction in Spanish reading, and showed no 
gains in Spanish literacy. 

When the classroom gave less attention to formal reading, or when the 
teacher was unable to maintain control over the students, there was less 
evidence of learning. Students in Group C were illiterate in both English 
and Spanish at the end of second grade. Despite three years of 
instruction, they showed no command of print--for practical purposes, they 
had learned nothing about reading during the primary grades. Student 2082 
showed some gain in English listening comprehension during second grade, 
but otherwise both students performed like entering kindergartners on exit 
from second grade. Both students were reasonably facile in Spanish; 
student 2097 was monolingual in Spanish on entry to kindergarten, and 
remained so through second grade. Inspection of the instructional program 
for these students shows little evidence of a coherent effort to teach 
reading in either language. While some time was allocated to decoding and 
comprehension, and books and the other "stuff" of reading were generally 
present, the classroom was noisy and the students unproductive. 

The data for Group C show another effect that seems to us deserving of 
emphasis--a formal program of reading instruction can have positive effects 
on reading achievement even for students who have not been exposed to such 
instruction in previous years. Students in Group C responded to the 
third-grade program of instruction as though they were "first-order Markov 
chains," to borrow a term from probability theory--both of the target 
students showed substantial gains in English decoding and comprehension 
during third grade, even though they had not been taught to read during the 
first and second grades. To put it another way, these students showed no 
evidence of a cumulative deficit, nor does it appear that they were unable 
to learn because they had missed a "critical period" in reading 
acquisition. 

The data from Group D reveal two trends of potential importance. 
These students were in a program that emphasized English decoding in first 
grade, followed by second-grade instruction that stressed comprehension in 
both languages. The achievement patterns for students 0044 and 0251 seem 
to reflect the instructional emphases--decoding skills are at or above 
expectation at the end of first grade, but comprehension is negligible; 
reading comprehension in English increases markedly during second grade, 
during which time student 0044 shows no further development of decoding 
skills, although student 0251 does show considerable growth in this area. 
In third grade, the curriculum emphasis shifts once more, from 
comprehension back to decoding, and once again the changes in student 
achievement seem to mirror this shift in relative emphasis. The eclecticism 
of present practice makes it difficult in most instances to draw sharp 
contrasts, but in this one instance the IRAS/RAMOS combination seems to be 
working as planned. 

The second point to be noted in the data for Group D is the difference 
between the response of student 0298 and the performance of the other two 
students. The observational data in Figure 17 are averages over all the 
instructional groups in the classroom. While these patterns serve for an 
overall characterization, the actual program for individual students may 
depart significantly from the average for the classroom as a whole. The 
RAMOS data can be analyzed at two additional levels of refinement--the 
instructional group to which each target student is assigned, and 
departures of the individual target student from the profile for the group. 
Departures of the latter sort were fairly uncommon, but our informal review 
of the observational data suggests that there are substantial differences 
between reading groups within a class. It is clear that student 0298 
differs markedly in achievement from the other two target students in Group 
D; our next step in accounting for such discrepancies will be to examine 
the instructional program for the group to which individual target students 
are assigned. 